Retrieval Augmented Generation (RAG) Application using OpenAI and ElasticSearch as Vector DB

A RAG (Retrieval-Augmented Generation) application that offers Q&A over long-form text using OpenAI and Elasticsearch. Here we are:

using Elasticsearch as a vector DB to store the indexed data and retrieve from it.
using OpenAI to create embeddings and to generate answers to questions after retrieving the relevant chunks from ES.
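
The retrieval step described above can be sketched in plain Python. This is only an illustration of the idea, not the app's code: the function names are hypothetical, the 3-dimensional vectors are toy stand-ins for real OpenAI embeddings, and in the app the ranking is done by Elasticsearch rather than in memory.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec, chunks, k=2):
    """Return the k chunk texts whose embeddings are closest to the query."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_vec, c["embedding"]),
        reverse=True,
    )
    return [c["text"] for c in ranked[:k]]

# Toy vectors (assumption): real embeddings would have 1536 dimensions.
chunks = [
    {"text": "chunk about startups", "embedding": [0.9, 0.1, 0.0]},
    {"text": "chunk about education", "embedding": [0.1, 0.9, 0.0]},
    {"text": "chunk about chatbots", "embedding": [0.7, 0.3, 0.1]},
]
question = "Who is Ajeet?"
query_vec = [1.0, 0.0, 0.0]  # stand-in for the embedding of the question

# The retrieved chunks become the context that is sent to the LLM with the question.
context = top_k_chunks(query_vec, chunks, k=2)
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
```

Only the best-matching chunks reach the prompt, which is what keeps the prompt short even when the indexed text is long.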

Getting Started

Prerequisites:

Python 3.9 or above
pip
virtualenv
elasticsearch - 7.14.0 (https://www.elastic.co/downloads/elasticsearch)
OpenAI API Key (https://platform.openai.com/)

Installation Steps:

Create and Activate virtual environment

python -m venv venv

source venv/bin/activate

Install dependencies

pip install -r requirements.txt

Download the spaCy model, which is used to split the text into sentences.

python -m spacy download en_core_web_sm
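
Sentence splitting matters because the sentences are what get grouped into chunks for embedding. The sketch below shows one possible greedy chunking scheme; it is an assumption, not necessarily how the app chunks text. The regex splitter is a naive stand-in so the example runs without spaCy, and `chunk_sentences` and `max_chars` are hypothetical names.

```python
import re

def split_sentences(text):
    """Naive stand-in for spaCy: split on whitespace that follows ., ! or ?"""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def chunk_sentences(sentences, max_chars=200):
    """Greedily pack consecutive sentences into chunks of about max_chars.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for sentence in sentences:
        candidate = (current + " " + sentence).strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

text = "He founded Joe Hukum. It was acquired by Freshworks. He then founded Seekho.ai."
sentences = split_sentences(text)
chunks = chunk_sentences(sentences, max_chars=55)
```

With the model installed, `[s.text for s in spacy.load("en_core_web_sm")(text).sents]` could replace `split_sentences`; spaCy handles abbreviations and other edge cases that the regex does not.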

Elasticsearch Setup (https://www.elastic.co/downloads/elasticsearch)
1. Download and unzip Elasticsearch
2. Start Elasticsearch by running bin/elasticsearch (or bin\elasticsearch.bat on Windows)
3. Test it by running curl -X GET "localhost:9200/" in a terminal

Run App

Before running the app, create a .env file from .env-template and add your OpenAI API key and Elasticsearch credentials.
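
For orientation, a .env file is a list of KEY=VALUE lines. The sketch below parses one and reports empty values. The variable names in it are hypothetical; the names the app actually reads are the ones listed in .env-template.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

# Hypothetical contents (assumption): use the keys from .env-template instead.
sample = """
# OpenAI
OPENAI_API_KEY=sk-your-key-here
# Elasticsearch
ES_HOST=http://localhost:9200
ES_USERNAME=elastic
ES_PASSWORD="changeme"
"""

config = parse_env(sample)
missing = [key for key, value in config.items() if not value]
```

A check like `missing` is a cheap way to catch a template value that was left blank before starting the app.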

python run.py

The app starts on localhost, port 8081. As a sanity check, run the following command in a terminal; it should return a Hello message.

curl -X GET "localhost:8081"

To Index and Query Data

1. Create ES Index

curl --location --request PUT 'http://localhost:9200/first-index' \
--header 'Content-Type: application/json' \
--data-raw '{
    "mappings": {
        "properties": {
            "text": {
                "type": "text"
            },
            "embedding": {
                "type": "dense_vector",
                "dims": 1536
            }
        }
    }
}'
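
The dims value has to equal the length of every vector stored in the embedding field, otherwise Elasticsearch rejects the document. 1536 matches the output size of OpenAI's text-embedding-ada-002 model; which embedding model the app uses is an assumption here. The sketch below builds the same mapping in Python and checks a vector against it (the helper names are hypothetical).

```python
import json

def build_index_mapping(dims=1536):
    """Build the mapping body: a text field plus a dense_vector of `dims` floats."""
    return {
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {"type": "dense_vector", "dims": dims},
            }
        }
    }

def check_embedding(vector, mapping):
    """Return True if the vector length matches the mapping's dims."""
    dims = mapping["mappings"]["properties"]["embedding"]["dims"]
    return len(vector) == dims

mapping = build_index_mapping()
body = json.dumps(mapping, indent=4)  # same JSON as the curl payload above
```

If you switch to an embedding model with a different output size, the index has to be recreated with the matching dims.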

If you want to delete the index for a clean start, use the following command.

curl --location --request DELETE 'http://localhost:9200/first-index'

2. Index Data

To index data, use the following curl command. This command sends a POST request to the specified endpoint with a JSON payload containing the text to be indexed and the name of the index.

Make sure you have the server running on localhost on port 8081.

curl --location --request POST 'localhost:8081/api/index' \
--header 'Content-Type: application/json' \
--data-raw '{
  "text": "Ajeet is an engineer turned product entrepreneur with experience in AI, SaaS, HealthTech and EdTech. He is a technology enthusiast and loves to work on new technologies. He was a founding member of leading health-tech startups HealthKart and TATA 1mg in India. He was the founder of Joe Hukum, a chatbot platform which was acquired by Freshworks. After Freshworks, he founded Seekho.ai to solve for the skill gap in Indian higher education. Currently, he is on a break and is exploring GenAI to solve for the next meaningful problem. He is passionate about solving zero to one problems and building products that can impact millions of lives.",
  "index_name": "first-index"
}'
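
The same request can be built from Python with only the standard library. This is a sketch under the assumption that the endpoint accepts exactly the JSON shown in the curl command; `build_json_post` is a hypothetical helper, and the text is shortened for the example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8081"

def build_json_post(path, payload):
    """Build (but do not send) a JSON POST request to the app."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_json_post(
    "/api/index",
    {
        "text": "Ajeet is an engineer turned product entrepreneur.",
        "index_name": "first-index",
    },
)

# To actually send it (requires the app to be running on port 8081):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The same helper works for the query endpoint by passing "/api/query" and a payload with "question" and "index_name".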

3. Query Data

Example 1:

curl --location --request POST 'localhost:8081/api/query' \
--header 'Content-Type: application/json' \
--data-raw '{
  "question": "Who is Ajeet?",
  "index_name": "first-index"
}'

Example 2:

curl --location --request POST 'localhost:8081/api/query' \
--header 'Content-Type: application/json' \
--data-raw '{
  "question": "Joe Hukum was acquired by which company?",
  "index_name": "first-index"
}'
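
Behind a query like these, the relevant chunks have to be fetched from Elasticsearch by vector similarity. How the app does this is not shown here; one way to do it on Elasticsearch 7.x is a script_score query that uses the built-in cosineSimilarity function, sketched below. `build_knn_query` and the default `size` are assumptions.

```python
def build_knn_query(query_vector, size=3):
    """Build a search body that ranks documents by cosine similarity
    between the question embedding and the stored 'embedding' field."""
    return {
        "size": size,
        "_source": ["text"],  # return only the chunk text, not the vector
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    # + 1.0 keeps the score non-negative, as Elasticsearch requires
                    "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
    }

# Placeholder vector (assumption): in practice this is the embedding of the question.
search_body = build_knn_query([0.0] * 1536, size=3)
```

The body would be sent to the index's _search endpoint, and the text of the returned hits would then be passed to OpenAI as context for the answer.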

Built by Ajeet with ☕️ and ❤️

ajeetsk/openai-elasticsearch-rag
